Running Head: DISTRIBUTED MODEL OF INFLECTIONAL MORPHOLOGY

Frequency Effects in the Processing of Inflectional Morphology: a Distributed Connectionist Account
Abstract
One source of evidence concerning the representation and processing of inflected words comes from frequency effects. Experiments in Dutch reported by Baayen, Dijkstra and Schreuder (1997) are interpreted as evidence for a dual-route race model in which inflected words are both stored as whole forms and decomposed into stems and affixes. In this paper we report two sets of neural network simulations demonstrating that a single-mechanism, distributed, connectionist model, trained to map from orthographic input to a representation of the meaning and inflectional status of words, can account for the Baayen et al. results without requiring two computational mechanisms. The pattern of frequency effects produced by this single-mechanism model depends critically on including homonymous affixes in the training set, such as the Dutch plural affix -en, which also marks verb infinitives. This finding suggests that frequency effects for morphologically complex words can be accounted for in a distributed system, without postulating distinct processing mechanisms of whole-word storage and morphological decomposition.

Davis Frequency Effects in the Page 3

Words that occur more frequently in language are processed more quickly. This is the longest-standing and most clearly established result in experimental psycholinguistics (for early summaries of research on word frequency, see Howes & Solomon, 1951; Broadbent, 1967; Morton, 1969). This basic result has been replicated in a range of tasks, such as reading aloud, picture naming, and semantic or lexical decision, and in a range of languages. The word frequency effect is taken as evidence that the systems involved in language processing respond to basic statistical properties of an individual’s linguistic experience.
Furthermore, through careful manipulations of different word properties, it has been possible to observe effects of the frequency of individual morphemes in complex words (Taft, 1979; Sereno & Jongman, 1997; Baayen, Dijkstra & Schreuder, 1997) or effects of the frequency of different meanings of homonymous words (Rodd, Gaskell & Marslen-Wilson, 2002; Borowsky & Masson, 1996) in order to infer which aspects of word frequency are crucial for a particular task.

However, despite widespread agreement that word frequency plays an important role in language processing, a variety of accounts have been proposed of the mechanism by which frequency effects arise. It has been proposed that word frequency effects arise because access to the mental lexicon involves a search through a frequency-ordered word list (in the search model of Forster, 1976). Alternatively, it is proposed that changes to the processing properties of the units that represent individual lexical items (such as lowering the threshold for recognition in the logogen model of Morton, 1969, or raising the resting activation level in the interactive-activation model of word recognition proposed by McClelland and Rumelhart, 1981) allow high frequency words to be recognised more quickly.

Both serial search and logogen accounts share a single common assumption, namely that there is a single lexical unit that uniquely represents each word; that is, they are localist models. Thus effects of the frequency of occurrence can be localised to changes occurring to a lexical unit. In contrast, a recent class of computational account, distributed connectionist models, proposes that individual words are represented as a pattern of activation over many active units, with different banks of units representing domains of word knowledge such as orthography, phonology and semantics.
Linguistic knowledge in distributed models is represented by the strength or weights of connections that link these banks of units. These connection weights are not pre-determined but are gradually learnt by training the network to translate from one representation to another (e.g. reading aloud involves a translation from orthography to phonology, auditory comprehension involves mapping from phonology to semantics, etc.). During training, a learning algorithm (typically back-propagation; Rumelhart, Hinton & Williams, 1986) adjusts connection strengths to reduce the discrepancy between the network’s output and a target representation.

One crucial aspect of distributed connectionist accounts is that they change the interpretation of a distinction that is common to both linguistics and psycholinguistics: the distinction between rule-governed forms and exceptions. In distributed models, a single set of connections is not only able to acquire knowledge of individual lexical items or exceptions, but is also able to extract and apply regularities to new items, showing how these systems can generalise in a seemingly rule-governed way. For example, in generating the past-tense forms of English verbs from their stems, a single system can learn the phonological transformation of both regular verbs (jump-jumped, play-played) and irregular verbs (leap-leapt, give-gave), as well as generalising to novel forms (wug-wugged) (Rumelhart & McClelland, 1986; Plunkett & Marchman, 1991; 1993). Thus the distinction between rule-governed forms and exceptions to these rules may be an accurate description of the structure of linguistic knowledge but need not imply that there are two underlying mechanisms by which these forms are processed.

In the current paper, we explore whether these distributed connectionist models are able to account for the recognition (rather than the phonological transformation) of regularly inflected words.
In describing these simulations we will not only consider the capabilities of distributed connectionist networks (that is, whether they can perform the task) but, crucially, we will evaluate the behaviour of these networks by comparison with experimental investigations of the processing of regularly inflected words. In order to conclude that a single-mechanism distributed connectionist model can account for the recognition of inflected words, we require the model to simulate the pattern of behavioural data produced by human participants. Before describing these network simulations, we will therefore review the target empirical phenomena, and their interpretation in more traditional, localist accounts of lexical representation and processing.

Frequency effects in the recognition of regularly inflected words

Tasks that involve accessing stored lexical information are sensitive to the frequency of occurrence of these units in the linguistic environment. For example, in the lexical decision task, where participants make a speeded response to indicate whether a letter string is a real word or a non-word, a typical response time of 600 ms for a word that occurs once in a million words of text will decrease by approximately 10% for an otherwise matched word that occurs with a frequency of 100 occurrences per million words. In logogen-style models (cf. Morton, 1969), in which words are identified when the activation of the lexical unit exceeds some threshold value, this effect is simulated by assuming that more frequently occurring words have a higher level of resting activation or a lower recognition threshold. In either case, more frequent words have a head-start in the recognition process, and will be identified faster.

One important theoretical question in this framework is whether the critical logogens that are activated during word identification represent whole words or individual morphemes.
Are inflected words like “tables” stored whole or decomposed into the smaller units “table” + “s”? This issue can be addressed using the frequency effect as a diagnostic. For example, if we consider the words “neck/necks” and “lip/lips”, these words are approximately matched on the frequency of occurrence of the lemma {neck} or {lip} at around 80 occurrences/million words in the CELEX database (Baayen, Piepenbrock & Gulikers, 1995). However, as we might expect (since individuals typically have two lips but only one neck), the frequency of occurrence of singular and plural forms of these two items is very different. The word “neck” occurs many times more frequently than “necks” (word-form frequencies: neck = 72/million, necks = 7/million; the noun {neck} is singular-dominant), whereas the plural “lips” is used much more frequently than the singular “lip” (word-form frequencies: lip = 17/million, lips = 61/million; the noun {lip} is plural-dominant).

On an account in which there are individual logogens for the singular and plural form of these words (a whole-word storage account), we would expect substantial differences in response times to “lip” and “neck” or to “lips” and “necks”, with the singular of the singular-dominant noun being responded to more quickly than the singular of the plural-dominant noun, and vice versa for plural forms. This pattern of results is illustrated graphically in Figure 1a. Conversely, if both of these words were decomposed into smaller units such that a single lemma unit (for {neck} or {lip}) was activated for both singular and plural forms, we would not expect a difference in response time between singular- and plural-dominant nouns, so long as those items are matched on lemma frequency. To build a model that functions in this way, we need to invoke an additional process that allows the inflected words “necks” and “lips” to gain access to the appropriate lemma representation.
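As a rough illustration, the two idealised accounts can be written as simple functions of frequency. The sketch below uses the word-form frequencies quoted above. The log-linear mapping from frequency to response time (600 ms at one occurrence per million, 30 ms faster per tenfold increase, which gives a decrease of about 10% at 100 per million), the 40 ms decomposition cost, and all function names are illustrative assumptions; this is not a model taken from the literature.

```python
import math

# Word-form frequencies per million, as quoted in the text (CELEX).
WORD_FORM_FREQ = {"neck": 72, "necks": 7, "lip": 17, "lips": 61}
STEM_OF = {"neck": "neck", "necks": "neck", "lip": "lip", "lips": "lip"}

BASE_MS = 600.0        # assumed RT for a word occurring once per million
SLOPE_MS = 30.0        # assumed speed-up per tenfold increase in frequency
PARSE_COST_MS = 40.0   # hypothetical cost of stripping the plural affix


def rt_from_frequency(freq_per_million):
    """Assumed log-linear mapping from frequency to response time (ms)."""
    return BASE_MS - SLOPE_MS * math.log10(freq_per_million)


def lemma_frequency(word):
    """Summed frequency of all listed forms that share the word's stem."""
    stem = STEM_OF[word]
    return sum(f for w, f in WORD_FORM_FREQ.items() if STEM_OF[w] == stem)


def whole_word_rt(word):
    """Whole-word storage: each form has its own unit, so RT tracks word-form frequency."""
    return rt_from_frequency(WORD_FORM_FREQ[word])


def decomposition_rt(word):
    """Full decomposition: RT tracks lemma frequency, plus a parsing cost for plurals."""
    is_plural = word != STEM_OF[word]
    cost = PARSE_COST_MS if is_plural else 0.0
    return rt_from_frequency(lemma_frequency(word)) + cost


for word in ("neck", "lip", "necks", "lips"):
    print(f"{word:6s} whole-word: {whole_word_rt(word):6.1f} ms   "
          f"decomposed: {decomposition_rt(word):6.1f} ms")
```

Under these assumptions the whole-word functions produce a crossover between the two nouns, while the decomposition functions give nearly identical times for the two lemma-matched nouns and a uniform slowing for plurals.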
This decomposition process may add to the time required to respond to plural nouns so that although there is no difference between response times to singular- and plural-dominant nouns, responses are generally slowed to plural forms. This predicted pattern is illustrated in Figure 1b.

Experiments of this form were first reported in English by Taft (1979). Taft observed (Expt 2) that lexical-decision response times showed an effect of lemma frequency for inflected and uninflected words that were matched on word-form frequency, a pattern suggesting decomposition of inflected forms. However, in a further experiment (Expt 3), Taft also observed an effect of word-form frequency on response times to lemma-frequency matched items, a result that suggests that inflected words are stored as whole forms. This pattern was interpreted by Taft as evidence for both whole-word and decomposed lexical representations at different levels of the recognition system.

However, other authors have criticised the experimental materials used by Taft (1979). For instance, Sereno and Jongman (1997) pointed out that Taft’s items were not matched for word class, and that the groups of words that contained more nouns would be likely to produce faster responses. Furthermore, it is unclear whether all three of the inflections tested by Taft (-ed, -ing and -s) are equally sensitive to word-form and lemma frequency. Later in the paper we will review the results of experiments in Dutch and Finnish conducted by Bertram and colleagues (Bertram, Laine & Karvinen, 1999; Bertram, Schreuder & Baayen, 2000) which suggest differences in the behaviour of words with different inflectional endings.

In seeking to replicate the Taft (1979) findings, Sereno and Jongman (1997) tested only nouns and used the same set of materials in both singular and plural form.
Since word-form frequency effects should arise in opposite directions for the two forms (as depicted by the interaction in Figure 1a), this effect is less likely to arise from a simple confound in the experimental materials. However, Sereno and Jongman conducted separate experiments with different participants on singular and plural nouns (with both words and non-words either all inflected or all uninflected) such that some participants were tested on a word list consisting entirely of inflected items. These participants could have adopted different response strategies from those tested on a mix of inflected and uninflected items; for instance, it is possible that inflectional affixes would be ignored in making a lexical decision response. Furthermore, since the Sereno and Jongman experiments included only a small number of items (12 per condition), it is possible that a lack of statistical power may be responsible for the null effects of lemma frequency that they report.

We will therefore focus on results reported for Dutch nouns by Baayen, Dijkstra and Schreuder (1997). Their experiment tested lexical decision responses to singular and (-en) plural forms of 93 nouns divided into high and low lemma frequency sets. Each of the 100 participants was tested on a mix of singular and plural nouns with inflected and uninflected non-words, to ensure that participants had to process inflectional endings in making their lexical decisions. Test items were divided into two lists to ensure that participants did not see both forms of any noun, avoiding priming effects between plurals and singulars. Given the size of the data set we can be reasonably sure that this experiment had sufficient power to detect small behavioural effects. The pattern of response times obtained by Baayen et al. is shown in Figure 2.
As can be seen by comparison of Figure 2 with Figure 1a, Baayen et al.’s (1997) results did not conform to the predictions of a whole-word storage account. Although there was a reliable effect of word-form frequency on response times to the plural nouns, there was no significant difference between response times to the singular form of singular- and plural-dominant nouns. Comparing the results shown in Figure 2 with the predictions of Figure 1b shows that this experiment also failed to confirm the predictions of an account based on decomposition of plural nouns. There was a highly significant effect of dominance on response times to plural forms, an effect that would not be predicted by an account in which plural forms are decomposed and recognised on the basis of a shared lemma representation.

In summary, the results of the experiment reported by Baayen and colleagues show a pattern that is a mixture of simple storage and full decomposition accounts. Correspondingly, they interpret these results as being in line with the predictions of accounts of word recognition which postulate that both of these processing mechanisms (whole-word storage and morphological decomposition) are involved in the identification of inflected forms. Dual-route or dual-mechanism models such as these have been proposed by a variety of authors (Caramazza, Miceli, Silveri & Laudanna, 1985; Pinker, 1991; Frauenfelder & Schreuder, 1991), each making different assumptions regarding the relative role of storage and decomposition in lexical processing. For example, Pinker (1991) proposes that all regularly-inflected forms are decomposed during processing and accessed via their stems, except for irregular forms which must be stored as whole forms (such as the irregular plural “mice” in English).
In the Augmented Addressed Morphology (AAM) model proposed by Caramazza and colleagues (Caramazza et al., 1985; Caramazza, Laudanna & Romani, 1988), it is suggested that all complex forms that have been recognised previously will have a stored lexical representation and only new or very low frequency plurals (e.g. nouns that have only been seen in the singular form) would be decomposed.

A careful consideration of these two models would suggest that despite the presence of two processing mechanisms, neither account would predict the exact pattern that was observed experimentally. For instance, although the model proposed by Pinker (1991) could account for finding a lemma frequency effect (or an absence of a word-form frequency effect) for singular nouns, it would still predict similar results to the full-decomposition account for plurals, i.e. that there would be no dominance effect. Conversely, the AAM model proposed by Caramazza and colleagues can account for the word-form frequency effect for plurals in two different ways. First, for high-frequency lemmas, plurals of both singular- and plural-dominant nouns would be stored and word-form frequency effects would be observed. Secondly, for low-frequency or unfamiliar plurals (such as for low-frequency, singular-dominant nouns) the need to decompose these items would further increase the size of the dominance effect. Nonetheless, for the singular forms (which do not require any decomposition) the AAM model appears to predict a word-form frequency effect that was not observed experimentally.

Part of the problem for both of these accounts is that although they include two processing mechanisms, the recruitment of each of the two routes is a fairly strict ‘either/or’ based on the familiarity or regularity of the target item.
In practice, for the experimental materials used by Baayen and colleagues (which are of reasonable frequency and entirely regular), processing in both of these models would be dominated by one of the two available routes (decomposition in the Pinker model, storage in the AAM model). As is apparent from the comparison of Figures 1 and 2, neither of these single-mechanism profiles is appropriate for Baayen and colleagues’ data.

The Morphological Race Model (MRM; Baayen et al., 1997; Frauenfelder & Schreuder, 1991; Schreuder & Baayen, 1995) proposed to account for these data again includes mechanisms of both whole-word storage and morphological decomposition. However, rather than selecting one of the two routes for each type of item, the MRM proposes that the two processing routes ‘race’ against each other, with the output of the winning route (i.e. whichever process is completed faster) accounting for the processing of any particular item. Such a race model provides for a dynamic assignment of items to the two routes, such that whichever process operates more rapidly and efficiently will determine the response time for a particular item. Critically, however, processing is still completed in the non-winning route and the results of the slower route will still influence the behaviour of the model under certain circumstances.

In a series of mathematical simulations, Baayen et al. (1997) show that the MRM predicts the correct pattern of results when operating under the following constraints. They propose that the decomposition process operates fairly slowly for -en plurals. For this reason, the majority of the plural nouns are recognised by the faster whole-word route and hence response times for plurals will be primarily determined by word-form frequency (hence the dominance effect observed for plural forms).
Despite the fact that decomposition operates slowly, Baayen and colleagues (1997) still expect that this processing route would correctly analyse plural forms (determining, for example, that “lips” is the plural of the singular noun “lip”). They further propose that successful decomposition of plural forms alters the representation of the noun stem by boosting the resting activation of the lexical unit for the stem. In this way, although decomposed processing cannot be readily observed in lexical decision responses to plurals (since this route does not win the race and initiate a response), the results of decomposition can be detected in the processing of singulars, since it is the combined frequency of singular and plural forms that determines response times.

Baayen et al. (1997) interpret the combination of word-form frequency effects for plurals and combined, lemma frequency effects for singulars as evidence that uniquely favours the dynamic combination of storage and decomposition proposed in their dual-route morphological race model. In further work they show that this model (with identical parameter settings) can readily simulate response time data from experiments in which other manipulations of surface frequency and lemma frequency are made. In conclusion, they suggest that not only do other dual-mechanism models fail to predict the correct pattern of results but also that (p. 113) “it is difficult to see how these patterns could be understood using monolithic neural nets... modelling in one pass what in our view is a complex multilayered system”. It is this challenge that we address in the current paper, exploring the claim by Baayen and colleagues that this complex pattern of results cannot be simulated using a single-mechanism, distributed connectionist model.
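The race mechanism described above can be sketched in a few lines. This is a deliberately simplified, deterministic illustration and not a reproduction of the mathematical simulations of Baayen et al. (1997): the log-linear access-time function, the 80 ms parsing cost, and the function names are all assumptions, and the English neck/lip frequencies from earlier in the text stand in for the Dutch items.

```python
import math

# Word-form frequencies per million for two lemma-matched nouns. These are the
# English examples used earlier in the text, standing in for the Dutch items.
WORD_FORM_FREQ = {"neck": 72, "necks": 7, "lip": 17, "lips": 61}
PLURAL_OF = {"neck": "necks", "lip": "lips"}

BASE_MS = 600.0        # assumed RT at one occurrence per million
SLOPE_MS = 30.0        # assumed speed-up per tenfold increase in frequency
PARSE_COST_MS = 80.0   # hypothetical: decomposing the plural affix is slow


def access_time(freq_per_million):
    """Assumed log-linear mapping from frequency to access time (ms)."""
    return BASE_MS - SLOPE_MS * math.log10(freq_per_million)


def stem_frequency(stem):
    """Singular plus plural frequency: successful parses of the plural are
    assumed to boost the resting activation of the stem's unit."""
    return WORD_FORM_FREQ[stem] + WORD_FORM_FREQ[PLURAL_OF[stem]]


def singular_rt(stem):
    """Singulars are accessed through the stem unit, so combined frequency matters."""
    return access_time(stem_frequency(stem))


def plural_rt(stem):
    """Race between the whole-word route and the decomposition route;
    the faster route determines the response time."""
    whole_word = access_time(WORD_FORM_FREQ[PLURAL_OF[stem]])
    decomposition = access_time(stem_frequency(stem)) + PARSE_COST_MS
    return min(whole_word, decomposition)


for stem in ("neck", "lip"):
    print(f"{stem:5s} singular: {singular_rt(stem):6.1f} ms   "
          f"plural: {plural_rt(stem):6.1f} ms")
```

With a slow decomposition route the whole-word route wins for both plurals, so plural times follow word-form frequency, while singular times follow the combined frequency and are nearly identical for the two nouns.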
Frequency and regularity in distributed connectionist models

Previous simulation work has shown that distributed connectionist models trained on a variety of mappings are sensitive to the frequency of particular items presented during training as well as the extent to which components of the input-output mapping are consistent across different items (i.e. regularity). For example, in models of the computation of phonology from orthography (e.g. the Seidenberg & McClelland (1989) model of reading aloud), the phonology of words that occur more frequently in the training set is computed with reduced error. Furthermore, error rates are also lower for items with orthographic neighbours that are pronounced in a consistent way (such as “hint”, “mint”, “splint”, “tint” etc.) than for items that are inconsistent with their neighbours (e.g. “pint”). These two effects interact such that effects of frequency are larger for irregular or inconsistent items and effects of consistency are larger for low frequency items, a pattern reminiscent of that observed in experimental investigations of reading aloud (Taraban & McClelland, 1987).

This interaction between frequency and regularity reflects the sensitivity of the network to the frequency of whole forms and the frequency of regular components of those forms. The network learns to associate combinations of letters with speech sounds one word at a time. After each experience of a particular word, network weights are changed so as to capture the relationship between spelling and sound for that word. Items on which the network is trained more often (i.e. words that occur more frequently in the language) will have more opportunity to alter the network’s weights and hence a processing advantage is observed for high frequency words. Importantly, the network learns the orthography-phonology mapping by generalising the spelling-sound correspondences found in individual words.
For items that have a consistent relationship between spelling and sound (such as the rhyming set of -int words listed before), training on one item will benefit other words that are spelt and pronounced in the same way. Conversely, exception words (such as “pint”) do not benefit from the influence of their neighbours and must be learnt as whole forms. The network is therefore much more sensitive to the frequency of presentation of exception words (for a more detailed explanation and mathematical treatment of frequency by regularity interactions, see Plaut, McClelland, Seidenberg and Patterson, 1996).

Having described how frequency and regularity affect models of reading aloud, we can now consider how these properties may be extended to account for the empirical data of Baayen et al. (1997). However, previous simulations have shown that networks trained on the task of mapping orthography to phonology are incapable of performing lexical decision at a human-like level of accuracy (Seidenberg & McClelland, 1989; Besner et al., 1990). In order to account for behavioural data obtained from the lexical decision task, we require simulations that map from the spelling (or sound) of words to their meanings (cf. Plaut, 1997; Gaskell & Marslen-Wilson, 1997). Since these distributed networks use essentially the same learning algorithms and computational mechanisms, we may assume that similar effects of frequency and regularity will be apparent in these mappings between form and meaning. In the mapping between form and meaning, however, systematic regularities between input and output representations are fewer in number; the relationship between form and meaning at the single word level is essentially arbitrary.
A notable departure from this arbitrariness is provided by the presence of morphological units in the input (such as the English plural affix -s which marks the plurality of nouns, or the similarity in meaning of “lip” and “lips” provided by their shared stem). By extension of the principles found in models of reading aloud we would expect that differences in the frequency of morphological components of the form-to-meaning mapping would be reflected in the behaviour of a trained connectionist network. Indeed, prior simulations have shown the effect of these regularities in the learning profile (Rueckl & Raveh, 1999), internal representations (Davis, Marslen-Wilson & Hare, 1996; Rueckl & Raveh, 1999) and priming behaviour (Plaut & Gonnerman, 2000) of distributed connectionist networks. Nonetheless, detailed simulations of the effect of word and lemma frequency on the identification of morphologically complex words have not been conducted. In the current manuscript, we report simulations exploring the behaviour of a distributed connectionist model of the processing of regularly inflected words. Our goal was to simulate the results reported by Baayen et al. (1997) on the processing of Dutch singular and plural nouns.

Simulation 1: Modelling Dutch plural morphology

Network Architecture

All of the simulations reported in this chapter used a standard, 3-layer feed-forward network of units with a sigmoidal activation function (as used by Seidenberg & McClelland, 1989; Plaut & Gonnerman, 2000 and others). To simulate the results of visual lexical decision experiments in a distributed network we require a system that maps from a representation of the visual form of a word to a representation of its meaning or semantics. Since many of the critical test items for the network will be inflected with the Dutch plural affix (-en), the network should accommodate nouns that are marked for plurality.
For this reason an additional output unit was added to the semantic layer, to be activated in response to nouns presented with the plural affix. As the relationship between word-form and meaning for noun-stems is essentially arbitrary, 500 hidden units were required to map between input and output representations. Despite this large number of hidden units (chosen to improve overall performance and learning time), the network still does not resort to the localist solution of assigning single words to single hidden units (see Bullinaria & Chater, 1995, for further discussion). The architecture of the network is depicted in Figure 3.
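To make the architecture concrete, the following is a minimal sketch of the forward pass of such a network, written in plain Python. Only the three-layer feed-forward layout, the sigmoid units, the 500 hidden units and the extra plural output unit come from the description above. The input and semantic layer sizes, the weight initialisation, the input pattern and all names are hypothetical, and the weights are untrained (the back-propagation training step is not shown).

```python
import math
import random

N_INPUT = 40                 # hypothetical size of the orthographic input layer
N_HIDDEN = 500               # number of hidden units stated in the text
N_SEMANTIC = 30              # hypothetical number of units coding stem meaning
N_OUTPUT = N_SEMANTIC + 1    # one extra unit signals the plural affix


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_weights(n_out, n_in, rng, scale=0.1):
    """For each receiving unit: n_in connection weights followed by one bias."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer(inputs, weights):
    """Sigmoid units: weighted sum of the inputs plus the bias (last entry of each row)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in weights]


def forward(orthography, w_hidden, w_output):
    """Map an orthographic pattern to hidden and output activations."""
    hidden = layer(orthography, w_hidden)
    output = layer(hidden, w_output)
    return hidden, output


rng = random.Random(0)
w_hidden = random_weights(N_HIDDEN, N_INPUT, rng)
w_output = random_weights(N_OUTPUT, N_HIDDEN, rng)

# A made-up binary input pattern; a real input would encode a word's letters.
pattern = [float(rng.random() < 0.2) for _ in range(N_INPUT)]
hidden, output = forward(pattern, w_hidden, w_output)
semantics, plural_unit = output[:N_SEMANTIC], output[N_SEMANTIC]
print(len(hidden), len(semantics), round(plural_unit, 3))
```

Keeping the plural unit as the last element of the output layer means the same hidden-to-output weights carry both stem meaning and inflectional status, which is the single-mechanism property the simulations are meant to test.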